Logistic Regression and Marginal Effects

Intro

Logistic regression is everywhere in applied data science — binary outcomes, classification problems, probability estimation. Most people know how to fit one. Fewer know how to interpret it properly, and even fewer know what to do when they want to talk about effects on the probability scale rather than the log-odds scale.

This post is about that gap.

The basics

Logistic regression models the log-odds of a binary outcome as a linear function of predictors:

\[\text{logit}(P(Y=1)) = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p\]

Where:

  • \(P(Y=1)\) is the probability of the outcome occurring.

  • \(\text{logit}(P(Y=1))\) is the log-odds of the event occurring.

  • \(\beta_0, \beta_1, \beta_2, ..., \beta_p\) are the model coefficients.

  • \(X_1, X_2, ..., X_p\) are the predictor variables.
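The mapping from the linear predictor to a probability is just the inverse logit. A minimal sketch in Python (the coefficient and predictor values here are illustrative, not from a fitted model):

```python
import math

def predict_prob(beta0, betas, xs):
    """Predicted probability: inverse logit of the linear predictor."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1 / (1 + math.exp(-z))

# With all predictors at zero, the probability is inv-logit(beta0):
print(predict_prob(0.0, [0.05, 0.2], [0.0, 0.0]))   # 0.5

# With an assumed intercept of -2 and predictors at (10, 3):
print(predict_prob(-2.0, [0.05, 0.2], [10.0, 3.0]))  # ~0.289
```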

Interpreting Coefficients in Logistic Regression

The coefficients tell you about effects in log-odds space. A positive \(\beta\) means an increase in that predictor increases the log-odds of the outcome. Negative means the opposite. The magnitude tells you how strong that association is.

  1. Sign: A positive coefficient (\(\beta > 0\)) means an increase in the predictor is associated with a higher probability of the outcome. Negative works the other way.

  2. Magnitude: Larger absolute values indicate stronger associations. But “larger” is relative to the scale of your predictor — a coefficient of 2 means something very different for a predictor measured in kilometers versus one measured in meters.

  3. Statistical significance: Standard errors and p-values tell you whether the estimated coefficient is distinguishable from zero given your sample size. Statistical significance and practical significance are different things.

Practical Example

Say you’re modeling whether a customer makes a purchase. Two predictors: time spent on the site and number of items added to cart. Estimated coefficients:

  • \(\beta_{\text{Time Spent}} = 0.05\)

  • \(\beta_{\text{Number of Items}} = 0.2\)

Interpretation

  • \(\beta_{\text{Time Spent}} = 0.05\): Each additional minute on site is associated with an increase of 0.05 in the log-odds of purchasing. The direction makes sense; the magnitude requires context.

  • \(\beta_{\text{Number of Items}} = 0.2\): Each additional item in the cart is associated with an increase of 0.2 in the log-odds of purchasing. Stronger effect than time spent, which also makes sense.
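One common next step (discussed more below) is to exponentiate these coefficients into odds ratios. A quick sketch with the example values:

```python
import math

# Exponentiating a log-odds coefficient gives an odds ratio
# (values from the running example; they are illustrative):
beta_time, beta_items = 0.05, 0.2
print(math.exp(beta_time))   # ~1.051: each extra minute multiplies the odds by ~1.05
print(math.exp(beta_items))  # ~1.221: each extra item multiplies the odds by ~1.22
```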

Interpretation Issues

Here’s where most introductions to logistic regression stop, and where a lot of applied work goes wrong.

Log-odds are not intuitive. Most stakeholders — and most data scientists, honestly — think in probabilities. So the natural move is to convert: exponentiate to get odds ratios, maybe convert those to probabilities. The problem is that probabilities don’t behave linearly the way log-odds do.

On the probability scale, the effect of a one-unit increase in \(X_1\) is not constant — it depends on the current values of all the other predictors. Moving from a 5% to a 10% probability requires a much larger change in log-odds than moving from a 45% to a 50% probability, even though both are five-percentage-point shifts; equivalently, the same change in log-odds produces very different probability changes depending on where you start.

So if someone asks “by how much does adding an item to the cart increase the purchase probability?”, the honest answer is: it depends on where you are on the probability scale.
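This is easy to see numerically. The sketch below applies the same +0.2 shift in log-odds at two different starting points (the starting values are arbitrary):

```python
import math

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

# The same +0.2 change in log-odds produces different probability changes
# depending on the starting point on the curve:
for z0 in (-3.0, 0.0):
    p0, p1 = inv_logit(z0), inv_logit(z0 + 0.2)
    print(f"{p0:.3f} -> {p1:.3f} (change: {p1 - p0:+.4f})")
```

Near the tails the probability barely moves (about +0.01 here); near 50% the same log-odds shift moves it roughly five times as much.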

Marginal Effects

This is what marginal effects address. A marginal effect is the change in the outcome probability associated with a one-unit change in a predictor, holding all other predictors constant — computed on the probability scale.

NOTE: In linear regression, the regression coefficients are directly the marginal effects. In logistic regression, they’re not.

The complication is that the probability scale is non-linear. The derivative of the logistic function isn't constant; it varies with the value of the linear predictor. For a plain logistic model that derivative has a closed form, but estimating it numerically works just as well and generalizes to models where no closed form is available.

Finite differences

Differential calculus gives us the concept of the derivative: the instantaneous rate of change of a function at a point. For a function \(f(x)\), the derivative \(f'(x)\) is defined as:

\[f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}\]

When we can’t compute this analytically — because we don’t have a closed-form expression for the derivative, or because we’re working with a black-box model — we use finite differences: compute the function at two nearby points and divide by the step size.

\[f'(x) \approx \frac{f(x + h) - f(x)}{h}\]

For higher-order derivatives, the second difference is:

\[f''(x) \approx \frac{f(x + h) - 2f(x) + f(x - h)}{h^2}\]
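A quick sanity check of both formulas on a function whose derivatives are known, say \(f(x) = x^3\) (so \(f'(x) = 3x^2\) and \(f''(x) = 6x\)):

```python
def forward_diff(f, x, h=1e-6):
    """First derivative via the forward difference."""
    return (f(x + h) - f(x)) / h

def second_diff(f, x, h=1e-4):
    """Second derivative via the central second difference."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

f = lambda x: x ** 3
print(forward_diff(f, 2.0))  # ~12 (exact: 3 * 2^2 = 12)
print(second_diff(f, 2.0))   # ~12 (exact: 6 * 2 = 12)
```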

For logistic regression, this means: bump \(X_1\) up by a small amount (say, 0.001), recompute the predicted probability, take the difference, and divide by the step size. Do this at each observation. Average across the sample to get the Average Marginal Effect (AME).
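Here is a minimal numpy sketch of that procedure, using the coefficients from the running example plus an assumed intercept of -2 and synthetic customer data (both assumptions, chosen only for illustration):

```python
import numpy as np

def inv_logit(z):
    return 1 / (1 + np.exp(-z))

# Synthetic customer data (assumed for illustration).
rng = np.random.default_rng(42)
n = 500
X = np.column_stack([
    rng.normal(10.0, 3.0, n),           # time spent on site, minutes
    rng.poisson(2.0, n).astype(float),  # number of items in cart
])

# Coefficients from the running example, plus an assumed intercept.
beta0, beta = -2.0, np.array([0.05, 0.2])

def predict(X):
    return inv_logit(beta0 + X @ beta)

# Bump "number of items" by a small step at every observation, recompute
# predicted probabilities, divide by the step, then average: the AME.
h = 1e-3
X_bumped = X.copy()
X_bumped[:, 1] += h
ame = np.mean((predict(X_bumped) - predict(X)) / h)
print(f"AME of items on purchase probability: {ame:.4f}")
```

Note that with \(\beta = 0.2\) the marginal effect on the probability scale can never exceed \(0.2 \times 0.25 = 0.05\), since \(p(1-p)\) peaks at 0.25.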

The AME is interpretable: “on average, a one-unit increase in number of items added to cart is associated with a 4 percentage point increase in purchase probability.” That’s a statement in the language people actually use.

Conclusion

The gap between what logistic regression coefficients tell you (effects in log-odds space) and what most people want to know (effects on probability) is real and worth bridging. Marginal effects computed via finite differences give you that bridge cleanly, without requiring strong parametric assumptions about the shape of the effect.

In R, the margins and marginaleffects packages handle this. In Python, statsmodels provides get_margeff() on fitted logit models. For everything else, the numerical approach is straightforward to implement by hand.